On Exploring Soft Discretization of Continuous Attributes

Author

  • Hung Son Nguyen
Abstract

Searching for a binary partition of attribute domains is an important task in data mining; it arises in both decision tree construction and discretization. The most important advantages of decision tree methods are the compactness and clarity of their knowledge representation and their high classification accuracy. Decision tree algorithms also have drawbacks. For large data tables, existing decision tree induction methods are often inefficient in terms of both computation and description. Another disadvantage of standard decision tree methods is their instability: small deviations in the data may require significant reconstruction of the decision tree. We present novel soft discretization methods that use soft cuts instead of traditional crisp (or sharp) cuts. This concept makes it possible to generate more compact and stable decision trees with high classification accuracy. We also present an efficient method for generating soft cuts from large databases.
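The abstract does not say how a cut is scored or how a soft cut is applied, so what follows is only a minimal Python sketch under stated assumptions, not the paper's method. It scores each candidate crisp cut with a discernibility measure (the number of object pairs from different decision classes that the cut separates), a criterion assumed here for illustration. It then models a soft cut as an interval [lower, upper]: objects clearly below or above the interval go to one branch, objects inside it are sent to both, and an assumed linear membership function gives their degree of belonging to the right-hand branch. The names `best_crisp_cut`, `soft_cut_branches`, and `soft_cut_membership` are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def best_crisp_cut(values: Sequence[float],
                   labels: Sequence[str]) -> Tuple[Optional[float], int]:
    """Return (cut, score) for the crisp cut with the highest discernibility.

    Candidate cuts are midpoints between consecutive distinct values; the
    score counts pairs of objects with different labels that lie on
    opposite sides of the cut. Returns (None, -1) if no cut is possible.
    """
    data = sorted(zip(values, labels))
    n = len(data)
    total = Counter(label for _, label in data)
    left: Counter = Counter()  # label counts of objects left of the cut
    best_cut: Optional[float] = None
    best_score = -1
    for i in range(n - 1):
        left[data[i][1]] += 1
        if data[i][0] == data[i + 1][0]:
            continue  # no cut can separate equal values
        n_left, n_right = i + 1, n - (i + 1)
        # All cross pairs minus those whose two objects share a label.
        same_label = sum(left[k] * (total[k] - left[k]) for k in total)
        score = n_left * n_right - same_label
        if score > best_score:
            best_cut = (data[i][0] + data[i + 1][0]) / 2
            best_score = score
    return best_cut, best_score


def soft_cut_branches(x: float, lower: float, upper: float) -> Tuple[str, ...]:
    """Branches an object with value x is sent to under soft cut [lower, upper]."""
    if x < lower:
        return ("left",)
    if x > upper:
        return ("right",)
    return ("left", "right")  # inside the uncertainty interval: both branches


def soft_cut_membership(x: float, lower: float, upper: float) -> float:
    """Assumed linear degree to which x belongs to the right-hand branch."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return (x - lower) / (upper - lower)


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(best_crisp_cut(values, labels))       # (3.5, 9)
    print(soft_cut_branches(3.2, 3.0, 4.0))     # ('left', 'right')
    print(soft_cut_membership(3.5, 3.0, 4.0))   # 0.5
```

On this toy data the best crisp cut lies at 3.5 and separates 9 pairs. Replacing it with a soft cut over [3.0, 4.0] (an interval picked by hand here; the abstract does not say how the paper chooses endpoints) means a value such as 3.2 is passed to both branches instead of being forced to one side, so a small shift of a value near the boundary no longer flips its branch outright.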


Similar resources

OFP_CLASS: a hybrid method to generate optimized fuzzy partitions for classification

The discretization of values plays a critical role in data mining and knowledge discovery. At certain levels of knowledge, representing information through intervals is more concise and easier to understand than representing it with continuous values. In this paper, we propose a method for discretizing continuous attributes by means of a series of fuzzy sets, which constitutes a f...


Utilizing multiple pheromones in an ant-based algorithm for continuous-attribute classification rule discovery

The cAnt-Miner algorithm is an Ant Colony Optimization (ACO) based technique for classification rule discovery in problem domains that include continuous attributes. In this paper, we propose several extensions to cAnt-Miner. The main extension is based on the use of multiple pheromone types, one for each class value to be predicted. In the proposed μcAnt-Miner algorithm, an ant first selects a...


Dynamic Discretization of Continuous Attributes

Discretization of continuous attributes is an important task for certain types of machine learning algorithms. Bayesian approaches, for instance, require assumptions about data distributions. Decision trees, on the other hand, require sorting operations to deal with continuous attributes, which greatly increase learning times. This paper presents a new method of discretization, whose main char...


Soft Discretization to Enhance the Continuous Decision Tree Induction

Decision tree induction has been widely used to generate classifiers from training data through a process of recursively splitting the data space. In the case of training on continuous-valued data, the associated attributes must be discretized in advance or during the learning process. The commonly used method is to partition the attribute range into two or several intervals using a single or a...


Dynamic discreduction using Rough Sets

Discretization of continuous attributes is a necessary prerequisite for deriving association rules and discovering knowledge from databases. The derived rules are simpler and intuitively more meaningful if only a small number of attributes are used and each attribute is discretized into a few intervals. The present research paper explores the interrelation between discretization and reduction...

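The OFP_CLASS entry above discretizes continuous attributes by means of a series of fuzzy sets rather than crisp intervals. Its specific construction is truncated here, so the following is only a generic Python sketch of one standard way to build such a partition (a strong triangular fuzzy partition), not the OFP_CLASS method. The function name `triangular_memberships` and the example peak positions are hypothetical.

```python
from typing import List, Sequence


def triangular_memberships(x: float, centers: Sequence[float]) -> List[float]:
    """Membership degrees of x in a strong triangular fuzzy partition.

    Fuzzy set k peaks at centers[k] (at least one center, strictly
    increasing) and falls linearly to zero at its neighbouring centers, so
    the degrees of any x sum to 1. Values beyond the outermost centers
    belong fully to the end sets.
    """
    degrees = [0.0] * len(centers)
    if x <= centers[0]:
        degrees[0] = 1.0
        return degrees
    if x >= centers[-1]:
        degrees[-1] = 1.0
        return degrees
    for k in range(len(centers) - 1):
        lo, hi = centers[k], centers[k + 1]
        if x <= hi:  # x lies in (lo, hi]; only sets k and k+1 are non-zero
            t = (x - lo) / (hi - lo)
            degrees[k] = 1.0 - t
            degrees[k + 1] = t
            break
    return degrees


if __name__ == "__main__":
    centers = [0.0, 5.0, 10.0]  # hypothetical "low", "medium", "high" peaks
    for x in (-1.0, 2.5, 5.0, 7.5, 12.0):
        print(x, triangular_memberships(x, centers))
```

Unlike a crisp interval, a value between two peaks (such as 2.5 here) belongs partly to both neighbouring sets, which plays a role loosely similar to the uncertainty interval of a soft cut.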



Publication date: 2004